Safety Knowledge


Why companies don't share AV crash data – and how they could

Robohub

Autonomous vehicles (AVs) have been tested as taxis for years in San Francisco, Pittsburgh and around the world, and trucking companies have enormous incentives to adopt them. But AV companies rarely share the crash- and safety-related data that is crucial to improving the safety of their vehicles - mostly because they have little incentive to do so. Is AV safety data an auto company's intellectual asset or a public good? It can be both - with a little tweaking, according to a team of Cornell researchers. The team has created a roadmap outlining the barriers and opportunities to encourage AV companies to share the data needed to make AVs safer: untangling public versus private knowledge, updating regulations, and creating incentive programs.


Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Yuan, Youliang, Jiao, Wenxiang, Xie, Yuejin, Shen, Chihao, Tian, Menghan, Wang, Wenxuan, Huang, Jen-tse, He, Pinjia

arXiv.org Artificial Intelligence

Human safety awareness gaps often prevent the timely recognition of everyday risks. To address this problem, a proactive safety artificial intelligence (AI) system would work better than a reactive one: instead of merely reacting to users' questions, it would actively watch people's behavior and their environment to detect potential dangers in advance. Our Proactive Safety Bench (PaSBench) evaluates this capability through 416 multimodal scenarios (128 image sequences, 288 text logs) spanning 5 safety-critical domains. Evaluation of 36 advanced models reveals fundamental limitations: top performers like Gemini-2.5-pro achieve 71% image and 64% text accuracy, yet miss 45-55% of risks in repeated trials. Through failure analysis, we identify unstable proactive reasoning, rather than knowledge deficits, as the primary limitation. This work establishes (1) a proactive safety benchmark, (2) systematic evidence of model limitations, and (3) critical directions for developing reliable protective AI. We believe our dataset and findings can promote the development of safer AI assistants that actively prevent harm rather than merely respond to requests. Our dataset can be found at https://huggingface.co/datasets/Youliang/PaSBench.
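
Since the dataset is public, a thin evaluation harness is straightforward to sketch. The snippet below is a minimal outline using the Hugging Face `datasets` library; the split name and the `scenario`/`risk_present` fields are illustrative placeholders, as the abstract does not specify the schema.

```python
# A minimal evaluation harness for PaSBench. Only the dataset path comes from
# the abstract; the split name and the "scenario" / "risk_present" fields are
# illustrative assumptions, since the schema is not described above.
from datasets import load_dataset

def detect_risk(scenario) -> bool:
    # Replace with a real (multimodal) model call that flags potential dangers.
    return False

ds = load_dataset("Youliang/PaSBench", split="test")  # split name assumed

# An example counts as correct when the model's flag matches the label.
hits = sum(detect_risk(ex["scenario"]) == ex["risk_present"] for ex in ds)
print(f"proactive risk-detection accuracy: {hits / len(ds):.1%}")
```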


Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Kim, Minseon, Kwak, Jin Myung, Alssum, Lama, Ghanem, Bernard, Torr, Philip, Krueger, David, Barez, Fazl, Bibi, Adel

arXiv.org Artificial Intelligence

Fine-tuning language models is commonly believed to inevitably harm their safety, i.e., their ability to refuse harmful user requests, even when harmless datasets are used, thus requiring additional safety measures. We challenge this belief through systematic testing, showing that poor optimization choices, rather than inherent trade-offs, often cause safety problems, measured as harmful responses to adversarial prompts. By properly selecting key training hyper-parameters, e.g., learning rate, batch size, and gradient steps, we reduce unsafe model responses from 16% to approximately 5%, as measured by keyword matching, while maintaining utility performance. Based on this observation, we propose a simple exponential moving average (EMA) momentum technique in parameter space that preserves safety performance by creating a stable optimization path and retaining the original pre-trained model's safety properties. Our experiments on Llama-family models across multiple datasets (Dolly, Alpaca, ORCA) demonstrate that safety problems during fine-tuning can largely be avoided without specialized interventions; our approach outperforms existing methods that require additional safety data while offering practical guidelines for maintaining both model performance and safety during adaptation.
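
The EMA idea is simple enough to sketch directly. The following is a minimal PyTorch outline of a parameter-space moving average maintained alongside fine-tuning; the decay value and the placement of the update after each optimizer step are assumptions, not the paper's exact recipe.

```python
# A minimal sketch of parameter-space EMA during fine-tuning: keep a
# slowly-moving average of the weights so the served model stays close to
# the (safety-aligned) pre-trained starting point.
import copy
import torch

def make_ema(model):
    """Create a frozen copy of the model to hold the moving average."""
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

@torch.no_grad()
def ema_update(ema, model, decay=0.999):  # decay is an assumed hyper-parameter
    """theta_ema <- decay * theta_ema + (1 - decay) * theta."""
    for p_ema, p in zip(ema.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# Inside a standard fine-tuning loop:
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
#   ema_update(ema_model, model)
# Evaluate and deploy ema_model rather than model.
```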


R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge

In, Yeonjun, Kim, Wonjoong, Park, Sangwu, Park, Chanyoung

arXiv.org Artificial Intelligence

Although large reasoning models (LRMs) have demonstrated impressive capabilities on complex tasks, recent studies reveal that these models frequently fulfill harmful user instructions, raising significant safety concerns. In this paper, we investigate the underlying cause of LRM safety risks and find that models already possess sufficient safety knowledge but fail to activate it during reasoning. Based on this insight, we propose R1-Act, a simple and efficient post-training method that explicitly triggers safety knowledge through a structured reasoning process. R1-Act achieves strong safety improvements while preserving reasoning performance, outperforming prior alignment methods. Notably, it requires only 1,000 training examples and 90 minutes of training on a single RTX A6000 GPU. Extensive experiments across multiple LRM backbones and sizes demonstrate the robustness, scalability, and practical efficiency of our approach.
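
The abstract does not give R1-Act's template, but the mechanism, explicitly walking the model through a safety check before it answers, can be sketched. The template wording and field names below are illustrative guesses, not the published format.

```python
# A hedged sketch of the "structured reasoning" idea: post-training examples
# prepend an explicit safety-assessment step to the chain of thought so that
# existing safety knowledge is activated before the answer is produced.
SAFETY_TEMPLATE = (
    "<think>\n"
    "Step 1 (safety check): Does this request seek harmful, illegal, or "
    "dangerous assistance? {safety_assessment}\n"
    "Step 2 (response plan): {plan}\n"
    "</think>\n"
    "{answer}"
)

def build_training_example(instruction, safety_assessment, plan, answer):
    """Pack one post-training example with an explicit safety-activation step."""
    target = SAFETY_TEMPLATE.format(
        safety_assessment=safety_assessment, plan=plan, answer=answer
    )
    return {"prompt": instruction, "completion": target}
```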


Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models

Tan, Yingshui, Zheng, Boren, Zheng, Baihui, Cao, Kerui, Jing, Huiyun, Wei, Jincheng, Liu, Jiaheng, He, Yancheng, Su, Wenbo, Zhu, Xiangyong, Zheng, Bo, Zhang, Kaifu

arXiv.org Artificial Intelligence

With the rapid advancement of Large Language Models (LLMs), significant safety concerns have emerged. Fundamentally, the safety of large language models is closely linked to the accuracy, comprehensiveness, and clarity of their understanding of safety knowledge, particularly in domains such as law, policy, and ethics. This factuality ability is crucial in determining whether these models can be deployed and applied safely and compliantly within specific regions. To address these challenges and better evaluate the ability of LLMs to answer short factual questions, we introduce the Chinese SafetyQA benchmark. Chinese SafetyQA has several desirable properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate, Safety-related, Harmless). Based on Chinese SafetyQA, we perform a comprehensive evaluation of the factuality abilities of existing LLMs and analyze how these capabilities relate to other LLM abilities, e.g., retrieval-augmented generation (RAG) and robustness against attacks.
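
Because every item calls for a short factual answer, scoring plausibly reduces to normalized exact match, which is presumably what the "Easy-to-evaluate" property refers to. The sketch below assumes hypothetical `question`/`answer` fields; the benchmark's actual schema and grading rules may differ.

```python
# A minimal sketch of short-form factuality scoring: one short reference
# answer per item, compared after whitespace/case normalization.
# The "question" / "answer" field names are assumptions for illustration.
def normalize(text: str) -> str:
    """Drop all whitespace and lowercase, so trivial variants still match."""
    return "".join(text.split()).lower()

def score(items: list, predict) -> float:
    """items: list of {"question": str, "answer": str}; predict: str -> str."""
    correct = sum(
        normalize(predict(it["question"])) == normalize(it["answer"])
        for it in items
    )
    return correct / len(items)
```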


$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Kang, Mintong, Li, Bo

arXiv.org Artificial Intelligence

As LLMs become increasingly prevalent across various applications, it is critical to establish safety guardrails to moderate the input/output content of LLMs. Existing guardrail models treat the various safety categories independently and fail to explicitly capture the intercorrelations among them. This leads to limitations such as ineffectiveness due to inadequate training on long-tail data from correlated safety categories, susceptibility to jailbreaking attacks, and inflexibility regarding new safety categories. To address these limitations, we propose $R^2$-Guard, a robust reasoning-enabled LLM guardrail via knowledge-enhanced logical reasoning. Specifically, $R^2$-Guard comprises two parts: data-driven category-specific learning components and a reasoning component. The data-driven guardrail models provide unsafety probabilities of moderated content for different safety categories. We then encode safety knowledge among different categories as first-order logical rules and embed them into a probabilistic graphical model (PGM) based reasoning component. The unsafety probabilities for the different categories from the data-driven guardrail models are passed to the reasoning component for final inference. We employ two types of PGMs: Markov logic networks (MLNs) and probabilistic circuits (PCs), and optimize PCs to achieve a precision-efficiency balance via an improved graph structure. To further stress-test guardrail models, we employ a pairwise construction method to build a new safety benchmark, TwinSafety, which features principled categories. We demonstrate the effectiveness of $R^2$-Guard through comparisons with eight strong guardrail models on six safety benchmarks, and demonstrate its robustness against four SOTA jailbreaking attacks. $R^2$-Guard significantly surpasses the SOTA method LlamaGuard by 30.2% on ToxicChat and by 59.5% against jailbreaking attacks.
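
The fusion step can be illustrated with a toy Markov-logic-style model: unary factors carry the per-category unsafety probabilities from the data-driven guardrails, and a weighted rule couples them to an overall "unsafe" variable. The categories, rule, and weights below are invented for illustration and are not the paper's.

```python
# A toy sketch of the core idea: fuse per-category unsafety probabilities
# with a soft first-order rule in a tiny Markov-logic-style model, then
# compute P(unsafe) by brute-force enumeration over the joint assignments.
from itertools import product
import math

p_cat = {"violence": 0.30, "self_harm": 0.85}  # from data-driven guard models
rule_weight = 2.0  # weight of the soft rule: self_harm -> unsafe

def factor(assign):
    """Unnormalized weight of one joint assignment of binary variables."""
    w = 0.0
    # Unary evidence factors from the category-specific guardrails.
    for cat, p in p_cat.items():
        w += math.log(p if assign[cat] else 1 - p)
    # Reward assignments that satisfy the rule (self_harm -> unsafe).
    if (not assign["self_harm"]) or assign["unsafe"]:
        w += rule_weight
    return math.exp(w)

vars_ = ["violence", "self_harm", "unsafe"]
assigns = [dict(zip(vars_, vals)) for vals in product([False, True], repeat=3)]
z = sum(factor(a) for a in assigns)
p_unsafe = sum(factor(a) for a in assigns if a["unsafe"]) / z
print(f"P(unsafe) after reasoning: {p_unsafe:.2f}")
```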